Accelerating communication for parallel programming models on GPU systems
Authors
Abstract
As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task, as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.1x in Charm++, 11.7x in AMPI, and 17.4x in Charm4py. We also observe increases in bandwidth of up to 10x and 10.5x. We show the potential impact of our designs on real-world applications by evaluating a proxy application based on the Jacobi iterative method, improving communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.
Similar resources
Accelerating parallel particle swarm optimization via GPU
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems
We present a review of the current best practices in parallel programming models for dense linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, standalone manycore coprocessors, GPUs, and combinations of these. Of interest is the evolution of the programming models for DLA libraries, in particular the evolution from the popular LAPACK and ScaLAPACK libraries to th...
GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU
Voting algorithms, such as histogram and Hough transforms, are frequently used in various domains, such as statistics and image processing. Algorithms in these domains may be accelerated using GPUs. Implementing voting algorithms efficiently on a GPU, however, is far from trivial due to irregularities and unpredictable memory accesses. Existing GPU implementations therefore target only...
Geometric Programming for Communication Systems
Geometric Programming (GP) is a class of nonlinear optimization with many useful theoretical and computational properties. Over the last few years, GP has been used to solve a variety of problems in the analysis and design of communication systems in several ‘layers’ in the communication network architecture, including information theory problems, signal processing algorithms, ...
Journal
Journal title: Parallel Computing
Year: 2022
ISSN: 1872-7336, 0167-8191
DOI: https://doi.org/10.1016/j.parco.2022.102969